Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments of 10 models and 4 augmentation methods on PopQA, our new open-domain QA dataset with 14k questions. We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the tail. We then show that retrieval-augmented LMs largely outperform orders of magnitude larger LMs, while unassisted LMs remain competitive in questions about high-popularity entities. Based on those findings, we devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary. Experimental results show that this significantly improves models' performance while reducing the inference costs.
translated by 谷歌翻译
问题回答(QA)对知识库(KBS)的挑战是充满挑战的,因为所需的推理模式多样化,本质上是无限的,类型的推理模式。但是,我们假设以大型KB为基础,以回答各自子图中各个实体的查询类型所需的推理模式。利用不同子图的本地社区之间的这种结构相似性,我们引入了一个半参数模型(cbr-subg),(i)一个非参数组件,每个查询,每个查询,都会动态检索其他类似的$ k $ - $ - $ - $ - near-neart-tebrienk(KNN)培训查询以及查询特定的子图和(ii)训练的参数组件,该参数分量可以从KNN查询的子图中识别(潜在的)推理模式,然后将其应用于目标查询的子图。我们还提出了一种自适应子图收集策略,以选择特定于查询的compact子图,从而使我们可以扩展到包含数十亿个事实的完整freebase kb。我们表明,CBR-SUBG可以回答需要子图推理模式的查询,并在几个KBQA基准上的最佳模型竞争性能。我们的子图收集策略还会产生更多紧凑的子图(例如,webQSP的尺寸减小55 \%,而将答案召回的召回率增加4.85 \%)\ footNote {代码,模型和子码头可在\ url {https://github.com上获得。 /rajarshd/cbr-subg}}。
translated by 谷歌翻译
At the foundation of scientific evaluation is the labor-intensive process of peer review. This critical task requires participants to consume vast amounts of highly technical text. Prior work has annotated different aspects of review argumentation, but discourse relations between reviews and rebuttals have yet to be examined. We present DISAPERE, a labeled dataset of 20k sentences contained in 506 review-rebuttal pairs in English, annotated by experts. DISAPERE synthesizes label sets from prior work and extends them to include fine-grained annotation of the rebuttal sentences, characterizing their context in the review and the authors' stance towards review arguments. Further, we annotate every review and rebuttal sentence. We show that discourse cues from rebuttals can shed light on the quality and interpretation of reviews. Further, an understanding of the argumentative strategies employed by the reviewers and authors provides useful signal for area chairs and other decision makers.
translated by 谷歌翻译
从头开始解决复杂问题通常是有挑战性的,但如果我们可以访问其解决方案的其他类似问题,则更容易 - 一种称为基于案例的推理(CBR)的范式。我们提出了一种神经象征性的CBR方法(CBR-KBQA),用于在大知识库上应答。 CBR-KBQA由非参数内存组成,该内存存储案例(问题和逻辑表单)和参数模型,该参数模型可以通过检索与其相关的案例来为新问题生成逻辑表单。在包含复杂问题的几个KBQA数据集上,CBR-KBQA实现了竞争性能。例如,在ComplexWebQuestions数据集上,CBR-KBQA以11 \%的准确度优于当前最新状态。此外,我们表明CBR-KBQA能够使用新案例\ EMPH {没有}任何进一步的培训:通过在案例存储器中纳入一些人类标记的示例,CBR-KBQA能够成功地生成包含未经看线KB实体的逻辑表格以及关系。
translated by 谷歌翻译
Machine Translation (MT) system generally aims at automatic representation of source language into target language retaining the originality of context using various Natural Language Processing (NLP) techniques. Among various NLP methods, Statistical Machine Translation(SMT). SMT uses probabilistic and statistical techniques to analyze information and conversion. This paper canvasses about the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefed with a short description related to our experimental need. Further, a detailed analysis of Samanantar and OPUS dataset for model building, along with standard benchmark dataset (Flores-200) for fine-tuning and testing, is done as a part of our experiment. Different preprocessing approaches are proposed in this paper to handle the noise of the dataset. To create the system, MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim to understand the rules of grammar and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES
translated by 谷歌翻译
The usage of technologically advanced devices has seen a boom in many domains, including education, automation, and healthcare; with most of the services requiring Internet connectivity. To secure a network, device identification plays key role. In this paper, a device fingerprinting (DFP) model, which is able to distinguish between Internet of Things (IoT) and non-IoT devices, as well as uniquely identify individual devices, has been proposed. Four statistical features have been extracted from the consecutive five device-originated packets, to generate individual device fingerprints. The method has been evaluated using the Random Forest (RF) classifier and different datasets. Experimental results have shown that the proposed method achieves up to 99.8% accuracy in distinguishing between IoT and non-IoT devices and over 97.6% in classifying individual devices. These signify that the proposed method is useful in assisting operators in making their networks more secure and robust to security breaches and unauthorized access.
translated by 谷歌翻译
We consider the problem of constructing minimax rate-optimal estimators for a doubly robust nonparametric functional that has witnessed applications across the causal inference and conditional independence testing literature. Minimax rate-optimal estimators for such functionals are typically constructed through higher-order bias corrections of plug-in and one-step type estimators and, in turn, depend on estimators of nuisance functions. In this paper, we consider a parallel question of interest regarding the optimality and/or sub-optimality of plug-in and one-step bias-corrected estimators for the specific doubly robust functional of interest. Specifically, we verify that by using undersmoothing and sample splitting techniques when constructing nuisance function estimators, one can achieve minimax rates of convergence in all H\"older smoothness classes of the nuisance functions (i.e. the propensity score and outcome regression) provided that the marginal density of the covariates is sufficiently regular. Additionally, by demonstrating suitable lower bounds on these classes of estimators, we demonstrate the necessity to undersmooth the nuisance function estimators to obtain minimax optimal rates of convergence.
translated by 谷歌翻译
The devastation caused by the coronavirus pandemic makes it imperative to design automated techniques for a fast and accurate detection. We propose a novel non-invasive tool, using deep learning and imaging, for delineating COVID-19 infection in lungs. The Ensembling Attention-based Multi-scaled Convolution network (EAMC), employing Leave-One-Patient-Out (LOPO) training, exhibits high sensitivity and precision in outlining infected regions along with assessment of severity. The Attention module combines contextual with local information, at multiple scales, for accurate segmentation. Ensemble learning integrates heterogeneity of decision through different base classifiers. The superiority of EAMC, even with severe class imbalance, is established through comparison with existing state-of-the-art learning models over four publicly-available COVID-19 datasets. The results are suggestive of the relevance of deep learning in providing assistive intelligence to medical practitioners, when they are overburdened with patients as in pandemics. Its clinical significance lies in its unprecedented scope in providing low-cost decision-making for patients lacking specialized healthcare at remote locations.
translated by 谷歌翻译
Neural Architecture Search (NAS) is an automatic technique that can search for well-performed architectures for a specific task. Although NAS surpasses human-designed architecture in many fields, the high computational cost of architecture evaluation it requires hinders its development. A feasible solution is to directly evaluate some metrics in the initial stage of the architecture without any training. NAS without training (WOT) score is such a metric, which estimates the final trained accuracy of the architecture through the ability to distinguish different inputs in the activation layer. However, WOT score is not an atomic metric, meaning that it does not represent a fundamental indicator of the architecture. The contributions of this paper are in three folds. First, we decouple WOT into two atomic metrics which represent the distinguishing ability of the network and the number of activation units, and explore better combination rules named (Distinguishing Activation Score) DAS. We prove the correctness of decoupling theoretically and confirmed the effectiveness of the rules experimentally. Second, in order to improve the prediction accuracy of DAS to meet practical search requirements, we propose a fast training strategy. When DAS is used in combination with the fast training strategy, it yields more improvements. Third, we propose a dataset called Darts-training-bench (DTB), which fills the gap that no training states of architecture in existing datasets. Our proposed method has 1.04$\times$ - 1.56$\times$ improvements on NAS-Bench-101, Network Design Spaces, and the proposed DTB.
translated by 谷歌翻译
Deep learning-based object detection is a powerful approach for detecting faulty insulators in power lines. This involves training an object detection model from scratch, or fine tuning a model that is pre-trained on benchmark computer vision datasets. This approach works well with a large number of insulator images, but can result in unreliable models in the low data regime. The current literature mainly focuses on detecting the presence or absence of insulator caps, which is a relatively easy detection task, and does not consider detection of finer faults such as flashed and broken disks. In this article, we formulate three object detection tasks for insulator and asset inspection from aerial images, focusing on incipient faults in disks. We curate a large reference dataset of insulator images that can be used to learn robust features for detecting healthy and faulty insulators. We study the advantage of using this dataset in the low target data regime by pre-training on the reference dataset followed by fine-tuning on the target dataset. The results suggest that object detection models can be used to detect faults in insulators at a much incipient stage, and that transfer learning adds value depending on the type of object detection model. We identify key factors that dictate performance in the low data-regime and outline potential approaches to improve the state-of-the-art.
translated by 谷歌翻译